Query Word Image based Retrieval Scheme for Handwritten Tamil Documents

Authors

  • A N. Sigappi
  • S. Palanivel
Abstract

This paper presents an autoassociative neural network (AANN) based information retrieval mechanism that locates handwritten documents in a Tamil literary collection corresponding to query word images. The approach builds models for the chosen search word images, identifies the search word, and then retrieves the relevant documents. The AANN is trained with an appropriate combination of units in the layers of the network to arrive at a suitable model for each word in the vocabulary. The training phase segments the digitized text documents into lines and words, extracts profile and moment based features from the words, and builds an index of words. The features, computed from the intensity values of the pixels, capture the nuances of the strokes in the characters. Experimental results obtained for an index of words demonstrate the effectiveness of the scheme and its retrieval accuracy.
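The segmentation and feature extraction steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binarised page (ink pixels non-zero), uses projection profiles as a stand-in segmentation technique, and the function names `runs_of_ink`, `segment_lines`, `segment_words` and `word_features` are hypothetical. The profile features here are the per-column distance from the top and bottom edges to the first ink pixel, and the moment features are second-order central moments of the ink pixels.

```python
import numpy as np

def runs_of_ink(profile, min_gap=1):
    """Return (start, end) pairs of non-zero runs in a 1-D profile.
    Runs separated by fewer than min_gap empty positions are merged."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(profile)])
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1][1] = run[1]  # gap too small: extend the previous run
        else:
            merged.append(run)
    return [tuple(r) for r in merged]

def segment_lines(page):
    """Split a binarised page into text lines using the row-wise ink count."""
    return runs_of_ink((page > 0).sum(axis=1))

def segment_words(line_img, min_gap=3):
    """Split a line image into words using the column-wise ink count.
    min_gap is an assumed threshold separating word gaps from character gaps."""
    return runs_of_ink((line_img > 0).sum(axis=0), min_gap)

def word_features(word):
    """Concatenate upper/lower profiles (normalised by height) with
    second-order central moments of the ink pixels."""
    h, _ = word.shape
    ink = word > 0
    has_ink = ink.any(axis=0)
    upper = np.where(has_ink, ink.argmax(axis=0), h)        # top edge to first ink
    lower = np.where(has_ink, ink[::-1].argmax(axis=0), h)  # bottom edge to first ink
    ys, xs = np.nonzero(ink)
    cy, cx = ys.mean(), xs.mean()
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    return np.concatenate([upper / h, lower / h, [mu20, mu02, mu11]])

# Toy page: two text lines, the first containing two words.
page = np.zeros((12, 20), dtype=np.uint8)
page[1:4, 1:4] = 1
page[1:4, 10:14] = 1
page[7:10, 2:8] = 1

lines = segment_lines(page)
top, bottom = lines[0]
words = segment_words(page[top:bottom])
left, right = words[0]
features = word_features(page[top:bottom, left:right])
```

The length of this feature vector depends on the word width; feeding it to a fixed-size network such as an AANN would additionally require resampling the profiles to a common length, which the sketch leaves out.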

Similar resources

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

Handwritten Document Retrieval System for Tamil Language

The paper attempts to create a handwritten document retrieval system suitable for the Tamil language, with a view to recording traditional literature content for future reference. It proposes a search mechanism to access the query word images using a statistical model based methodology. The scheme revolves around a well-defined procedure which results in word models from where the search word can be r...

Tamil to English Cross Lingual Information Retrieval System for Agricultural Domain Using VSM

Language processing is a prominent research area across the country. Within it, query translation has been one of the major areas of research for the past ten decades. Tamil is a morphologically rich and complex language. Suitable morphological processing is very important for Cross Lingual Information Retrieval (CLIR). The contributions towards Tamil to English query translation and transliteration are l...

Script Independent Word Spotting in Multilingual Documents

This paper describes a method for script independent word spotting in multilingual handwritten and machine printed documents. The system accepts a query in the form of text from the user and returns a ranked list of word images from a document image corpus based on similarity with the query word. The system is divided into two main components. The first component, known as Indexer, performs indexi...

Content-based Information Retrieval from Handwritten Documents

This paper is about retrieving the closest matches from a set of scanned handwritten documents based on a query that is a document image. System indexing and retrieval is based on writer characteristics, textual content, and document metadata such as writer profile. Documents are indexed using global image features, e.g., stroke width, slant, word gaps, as well as local features that descri...


Publication date: 2012